Gegenbauer polynomial

Learning single-index models via harmonic decomposition

Joshi, Nirmit, Koubbi, Hugo, Misiakiewicz, Theodor, Srebro, Nathan

arXiv.org Machine Learning

We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection $\langle \boldsymbol{w}_*,\boldsymbol{x}\rangle$. Prior work has shown that under Gaussian inputs, the statistical and computational complexity of recovering $\boldsymbol{w}_*$ is governed by the Hermite expansion of the link function. In this paper, we propose a new perspective: we argue that "spherical harmonics" -- rather than "Hermite polynomials" -- provide the natural basis for this problem, as they capture its intrinsic "rotational symmetry". Building on this insight, we characterize the complexity of learning single-index models under arbitrary spherically symmetric input distributions. We introduce two families of estimators -- based on tensor unfolding and online SGD -- that respectively achieve either optimal sample complexity or optimal runtime, and argue that estimators achieving both may not exist in general. When specialized to Gaussian inputs, our theory not only recovers and clarifies existing results but also reveals new phenomena that had previously been overlooked.
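To make the Hermite-expansion criterion concrete: under Gaussian inputs, the first nonzero Hermite coefficient of the link function (its information exponent) is the quantity prior work uses to calibrate sample complexity. The sketch below is our own illustration rather than the paper's estimator; it extracts these coefficients numerically via Gauss-Hermite quadrature, and the function name hermite_coeffs is hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_coeffs(link, max_deg=8, quad_deg=80):
    """Coefficients c_k = E[link(g) He_k(g)] / k! for g ~ N(0, 1), computed
    with Gauss-Hermite quadrature for the probabilists' weight exp(-x^2/2)."""
    x, w = He.hermegauss(quad_deg)       # nodes and weights; sum(w) = sqrt(2*pi)
    w = w / np.sqrt(2.0 * np.pi)         # turn the integral into an expectation
    fx = link(x)
    coeffs = []
    for k in range(max_deg + 1):
        basis = np.zeros(max_deg + 1)
        basis[k] = 1.0
        He_k = He.hermeval(x, basis)     # He_k evaluated at the quadrature nodes
        coeffs.append(np.sum(w * fx * He_k) / math.factorial(k))
    return np.array(coeffs)

# Example: link(z) = He_3(z) = z^3 - 3z has information exponent 3.
c = hermite_coeffs(lambda z: z**3 - 3.0 * z)
print(next(k for k in range(1, len(c)) if abs(c[k]) > 1e-8))  # -> 3
```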


Fractal and Regular Geometry of Deep Neural Networks

Di Lillo, Simmaco, Marinucci, Domenico, Salvi, Michele, Vigogna, Stefano

arXiv.org Machine Learning

We study the geometric properties of random neural networks by investigating the boundary volumes of their excursion sets for different activation functions, as the depth increases. More specifically, we show that, for activations which are not very regular (e.g., the Heaviside step function), the boundary volumes exhibit fractal behavior, with their Hausdorff dimension monotonically increasing with the depth. On the other hand, for activations which are more regular (e.g., ReLU, logistic and $\tanh$), as the depth increases, the expected boundary volumes can either converge to zero, remain constant or diverge exponentially, depending on a single spectral parameter which can be easily computed. Our theoretical results are confirmed by numerical experiments based on Monte Carlo simulations.
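As a toy version of the Monte Carlo experiments mentioned above, the probe below counts zero crossings of a random network along a one-dimensional slice of the input plane, a crude proxy for the boundary volume of the excursion set $\{f > 0\}$; the width, depth, and He-style weight scaling are our assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_net(x, depth, width=64, act=np.tanh):
    """Random fully connected network with Gaussian weights; x has shape (n, d)."""
    h = x
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / h.shape[1]), (h.shape[1], width))
        h = act(h @ W)
    v = rng.normal(0.0, np.sqrt(1.0 / width), (width,))
    return h @ v

# Zero crossings of f along a fine 1-D slice: a crude proxy for the
# boundary volume of the excursion set {f > 0}.
t = np.linspace(-1.0, 1.0, 20000)
slice_2d = np.stack([t, np.zeros_like(t)], axis=1)
heaviside = lambda z: (z > 0).astype(z.dtype)

for depth in (1, 4, 16):
    for name, act in [("tanh", np.tanh), ("heaviside", heaviside)]:
        f = random_net(slice_2d, depth, act=act)
        crossings = int(np.sum(np.sign(f[:-1]) != np.sign(f[1:])))
        print(f"depth={depth:2d}  act={name:9s}  zero crossings ~ {crossings}")
```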


A Physics-Informed Machine Learning Approach for Solving Distributed Order Fractional Differential Equations

Aghaei, Alireza Afzal

arXiv.org Artificial Intelligence

This paper introduces a novel methodology for solving distributed-order fractional differential equations using a physics-informed machine learning framework. The core of this approach involves extending the support vector regression (SVR) algorithm to approximate the unknown solutions of the governing equations during the training phase. By embedding the distributed-order functional equation into the SVR framework, we incorporate physical laws directly into the learning process. To further enhance computational efficiency, Gegenbauer orthogonal polynomials are employed as the kernel function, capitalizing on their fractional differentiation properties to streamline the problem formulation. Finally, the resulting SVR optimization problem is solved either as a quadratic program or as a positive definite system in its dual form. The effectiveness of the proposed approach is validated through a series of numerical experiments on Caputo-based distributed-order fractional differential equations, encompassing both ordinary and partial derivatives.
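A minimal sketch of the kernel-plus-dual-solve ingredient, leaving out the physics-informed fractional-operator constraints that are the paper's actual contribution: a truncated Gegenbauer feature kernel is assembled and the regularized positive definite dual system is solved as in least-squares SVR. The truncation degree, flat weighting over degrees, and target function are illustrative assumptions.

```python
import numpy as np
from scipy.special import eval_gegenbauer

def gegenbauer_kernel(x, y, lam=1.0, deg=10):
    """Truncated Gegenbauer kernel K(x, y) = sum_n C_n^lam(x) C_n^lam(y).
    The flat weighting over degrees is an illustrative choice."""
    Cx = np.stack([eval_gegenbauer(n, lam, x) for n in range(deg + 1)], axis=1)
    Cy = np.stack([eval_gegenbauer(n, lam, y) for n in range(deg + 1)], axis=1)
    return Cx @ Cy.T

# Least-squares SVR surrogate: solve the positive definite dual system
# (K + reg * I) alpha = y, then predict with f(x) = K(x, X) alpha.
X = np.linspace(-1.0, 1.0, 40)
y = np.exp(X) * np.sin(3.0 * X)           # stand-in target, not an FDE solution
K = gegenbauer_kernel(X, X)
reg = 1e-8
alpha = np.linalg.solve(K + reg * np.eye(len(X)), y)

X_test = np.linspace(-1.0, 1.0, 200)
y_pred = gegenbauer_kernel(X_test, X) @ alpha
print(np.max(np.abs(y_pred - np.exp(X_test) * np.sin(3.0 * X_test))))
```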


Gegenbauer Graph Neural Networks for Time-varying Signal Reconstruction

Castro-Correa, Jhon A., Giraldo, Jhony H., Badiey, Mohsen, Malliaros, Fragkiskos D.

arXiv.org Artificial Intelligence

Reconstructing time-varying graph signals (or graph time-series imputation) is a critical problem in machine learning and signal processing with broad applications, ranging from missing-data imputation in sensor networks to time-series forecasting. Accurately capturing the spatio-temporal information inherent in these signals is crucial for effectively addressing these tasks. However, existing approaches relying on smoothness assumptions of temporal differences and simple convex optimization techniques have inherent limitations. To address these challenges, we propose a novel approach that incorporates a learning module to enhance the accuracy of the downstream task. To this end, we introduce the Gegenbauer-based graph convolutional (GegenConv) operator, which generalizes the conventional Chebyshev graph convolution by leveraging the theory of Gegenbauer polynomials. By moving beyond traditional convex formulations, we increase the model's expressive capacity and obtain a more accurate solution for recovering time-varying graph signals. Building upon GegenConv, we design the Gegenbauer-based time Graph Neural Network (GegenGNN) architecture, which adopts an encoder-decoder structure. Our approach further employs a dedicated loss function that combines a mean squared error term with Sobolev smoothness regularization. This combination enables GegenGNN to capture both fidelity to the ground truth and the underlying smoothness properties of the signals, enhancing reconstruction performance. We conduct extensive experiments on real datasets to evaluate the effectiveness of our proposed approach. The experimental results demonstrate that GegenGNN outperforms state-of-the-art methods, showcasing its superior capability in recovering time-varying graph signals.
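A sketch of the core GegenConv idea under our own simplifications: a graph filter $\sum_k \theta_k C_k^{(\alpha)}(\tilde{L})$ built from the Gegenbauer three-term recurrence, applied to a Laplacian rescaled to have spectrum in $[-1, 1]$. At $\alpha = 1$ the recurrence yields Chebyshev polynomials of the second kind (first-kind Chebyshev, used by the conventional filter, arises as the $\alpha \to 0$ limit); the function name and the toy graph are hypothetical, not the authors' code.

```python
import numpy as np

def gegenconv(L_scaled, x, theta, alpha=1.0):
    """y = sum_k theta_k C_k^alpha(L_scaled) x via the Gegenbauer recurrence
    n C_n(t) = 2 t (n + alpha - 1) C_{n-1}(t) - (n + 2 alpha - 2) C_{n-2}(t).
    L_scaled must have spectrum inside [-1, 1]."""
    C_prev2 = x                                  # C_0(L) x = x
    out = theta[0] * C_prev2
    if len(theta) > 1:
        C_prev1 = 2.0 * alpha * (L_scaled @ x)   # C_1(L) x
        out = out + theta[1] * C_prev1
    for k in range(2, len(theta)):
        C_k = (2.0 * (k + alpha - 1.0) * (L_scaled @ C_prev1)
               - (k + 2.0 * alpha - 2.0) * C_prev2) / k
        out = out + theta[k] * C_k
        C_prev2, C_prev1 = C_prev1, C_k
    return out

# Toy usage on a 4-node path graph with the normalized Laplacian.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(4) - d_inv_sqrt @ A @ d_inv_sqrt   # spectrum in [0, 2]
L_scaled = L - np.eye(4)                      # shift the spectrum into [-1, 1]
signal = np.array([1.0, 0.0, 0.0, 0.0])
print(gegenconv(L_scaled, signal, theta=[0.5, 0.3, 0.2]))
```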


An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings

Lu, Yue M., Yau, Horng-Tzer

arXiv.org Machine Learning

We investigate random matrices whose entries are obtained by applying a nonlinear kernel function to pairwise inner products between $n$ independent data vectors, drawn uniformly from the unit sphere in $\mathbb{R}^d$. This study is motivated by applications in machine learning and statistics, where these kernel random matrices and their spectral properties play significant roles. We establish the weak limit of the empirical spectral distribution of these matrices in a polynomial scaling regime, where $d, n \to \infty$ such that $n / d^\ell \to \kappa$, for some fixed $\ell \in \mathbb{N}$ and $\kappa \in (0, \infty)$. Our findings generalize an earlier result by Cheng and Singer, who examined the same model in the linear scaling regime (with $\ell = 1$). Our work reveals an equivalence principle: the spectrum of the random kernel matrix is asymptotically equivalent to that of a simpler matrix model, constructed as a linear combination of a (shifted) Wishart matrix and an independent matrix sampled from the Gaussian orthogonal ensemble. The aspect ratio of the Wishart matrix and the coefficients of the linear combination are determined by $\ell$ and the expansion of the kernel function in the orthogonal Hermite polynomial basis. Consequently, the limiting spectrum of the random kernel matrix can be characterized as the free additive convolution between a Marchenko-Pastur law and a semicircle law. We also extend our results to cases with data vectors sampled from isotropic Gaussian distributions instead of spherical distributions.
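A quick Monte Carlo probe of the setup (not of the equivalence proof): sample spherical data in the $n \asymp \kappa d^\ell$ regime, apply a pure degree-2 Hermite nonlinearity to the rescaled inner products, and inspect the empirical spectrum. The centering and normalization below are illustrative choices and differ from the paper's exact conventions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Polynomial scaling regime: n = kappa * d^ell with ell = 2.
d, ell, kappa = 40, 2, 0.5
n = int(kappa * d**ell)

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # uniform on the unit sphere

G = np.sqrt(d) * (X @ X.T)                      # O(1) off-diagonal inner products
h = lambda t: t**2 - 1.0                        # pure degree-2 Hermite kernel
K = h(G) / np.sqrt(n)
np.fill_diagonal(K, 0.0)                        # keep only the off-diagonal part

eigs = np.linalg.eigvalsh(K)
print(eigs.min(), eigs.max())   # edges of the empirical spectral distribution
```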


Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression

Misiakiewicz, Theodor

arXiv.org Machine Learning

We study the spectrum of inner-product kernel matrices, i.e., $n \times n$ matrices with entries $h(\langle \textbf{x}_i, \textbf{x}_j \rangle / d)$ where the $(\textbf{x}_i)_{i \leq n}$ are i.i.d.~random covariates in $\mathbb{R}^d$. In the linear high-dimensional regime $n \asymp d$, it was shown that these matrices are well approximated by their linearization, which simplifies into the sum of a rescaled Wishart matrix and the identity matrix. In this paper, we generalize this decomposition to the polynomial high-dimensional regime $n \asymp d^\ell, \ell \in \mathbb{N}$, for data uniformly distributed on the sphere and hypercube. In this regime, the kernel matrix is well approximated by its degree-$\ell$ polynomial approximation and can be decomposed into a low-rank spike matrix, the identity, and a `Gegenbauer matrix' with entries $Q_\ell(\langle \textbf{x}_i, \textbf{x}_j \rangle)$, where $Q_\ell$ is the degree-$\ell$ Gegenbauer polynomial. We show that the spectrum of the Gegenbauer matrix converges in distribution to a Marchenko-Pastur law. This problem is motivated by the study of the prediction error of kernel ridge regression (KRR) in the polynomial regime $n \asymp d^\kappa, \kappa > 0$. Previous work showed that for $\kappa \not\in \mathbb{N}$, KRR fits exactly a degree-$\lfloor \kappa \rfloor$ polynomial approximation to the target function. In this paper, we use our characterization of the kernel matrix to complete this picture and compute the precise asymptotics of the test error in the limit $n / d^\kappa \to \psi$ with $\kappa \in \mathbb{N}$. In this case, the test error can exhibit double descent behavior, depending on the effective regularization and signal-to-noise ratio at level $\kappa$. Because this double descent can occur each time $\kappa$ crosses an integer, it explains the multiple descent phenomenon in the KRR risk curve observed in several previous works.
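The Marchenko-Pastur claim for the Gegenbauer matrix lends itself to a direct simulation. The sketch below uses the addition-theorem identity $\mathbb{E}[Q_\ell(\langle \textbf{x}, \textbf{y} \rangle)^2] = 1 / B(d, \ell)$, with $B(d, \ell)$ the dimension of degree-$\ell$ spherical harmonics, to put the off-diagonal entries on unit variance; the overall $1/\sqrt{n}$ rescaling is our illustrative choice rather than the paper's exact normalization.

```python
import numpy as np
from scipy.special import eval_gegenbauer

rng = np.random.default_rng(2)

def Q(ell, d, t):
    """Degree-ell Gegenbauer polynomial for the sphere S^{d-1}, normalized
    so that Q(1) = 1."""
    a = (d - 2) / 2.0
    return eval_gegenbauer(ell, a, t) / eval_gegenbauer(ell, a, 1.0)

d, ell, psi = 50, 2, 0.4
B = (d + 2) * (d - 1) // 2            # dim of degree-2 spherical harmonics
n = int(psi * B)                      # polynomial regime n ~ psi * d^ell

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
W = np.sqrt(B) * Q(ell, d, X @ X.T)   # unit-variance off-diagonal entries
np.fill_diagonal(W, 0.0)              # drop the Q(1) diagonal

eigs = np.linalg.eigvalsh(W / np.sqrt(n))
print(np.mean(eigs**2))               # ~ 1 here; compare with the MP prediction
```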